Abstract: A new line of research [5] on the lasso [8] exploits the beautiful 
geometric fact that the lasso fit is the residual from projecting the response 
vector y onto a certain convex polytope [10]. This geometric picture also 
allows an exact geometric description of the set of accessible lasso models 
for a given design matrix, that is, which configurations of the signs of the 
coefficients it is possible to realize with some choice of y. In particular, the 
accessible lasso models are those that correspond to a face of the convex 
hull of all the feature vectors together with their negations. This convex hull 
representation then permits the enumeration and bounding of the number 
of accessible lasso models, which in turn provides a direct proof of model 
selection inconsistency when the size of the true model is greater than half 
the number of observations. 
1. Introduction 

The lasso [8] has proved a popular approach in high dimensional regression 
problems, where the number of variables is large relative to the number of 
observations. Perhaps the major reason for this is that the estimated coefficients 
are sparse. Indeed, as noted first by [7], the number of nonzero coefficients can 
never exceed the number of observations. 

Let us define a signed model to be a subset of supported variables together 
with the signs of their corresponding coefficients. For a fixed design matrix, it 
is not in fact the case that every signed model with n or fewer variables can 
be chosen by the lasso for some choice of y. This differs from ordinary 
least squares, where all such models are possible. 

In this paper, we will describe and count the signed models that are accessible, 
that is, which can possibly be chosen by the lasso. 

More formally, for a fixed design matrix $X_0 \in \mathbb{R}^{n \times p}$, the lasso solves the 
optimization problem 

$$\min_{\beta_0 \in \mathbb{R}^p} \ \tfrac{1}{2}\|y - X_0\beta_0\|_2^2 + \lambda\|\beta_0\|_1 \tag{1.1}$$

for fixed $\lambda > 0$. 

Under the relatively weak condition that $X_0$ is in general position, [9] showed 
that the minimizing $\beta_0$ is unique for any choice of $y \in \mathbb{R}^n$; he further showed 
that this condition holds almost surely, for example, if $X_0$ is chosen from any 
continuous probability distribution on $\mathbb{R}^{n \times p}$. For the remainder of this work, we 
will suppose that $X_0$ is such that $\beta_0$ is unique for any choice of y, and denote 
this by $\beta_0(y)$. 

Since we are interested in the signs of $\beta_0$, it will be notationally more 
convenient to solve the following equivalent problem: 

$$\min_{\beta \in \mathbb{R}^{2p}} \ \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\sum_{j=1}^{2p}\beta_j \quad \text{subject to } \beta \ge 0. \tag{1.2}$$

Here, $X = (X_0, -X_0)$, and $\beta = (\beta_0^+, \beta_0^-)$ stacks the positive and negative 
parts of $\beta_0$. Note that the solution to (1.2) will have at most one of $\beta_j$ and 
$\beta_{p+j}$ being nonzero. This is because if both are positive, one can subtract a 
common amount from both of them, reducing the sum of the $\beta_j$'s without 
changing the residual sum of squares. 
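The equivalence between (1.1) and (1.2) can be checked numerically on a toy example. The following is a minimal sketch in plain Python; the data `X0`, `y`, `beta0` and the helper names are hypothetical and chosen only for illustration. It splits a coefficient vector into positive and negative parts, confirms that the two objectives agree, and shows that raising both $\beta_j$ and $\beta_{p+j}$ by the same amount only increases the penalty.

```python
def split_signed(beta0):
    """Map beta0 in R^p to (beta0^+, beta0^-) in R^{2p}, entrywise nonnegative."""
    return [max(b, 0.0) for b in beta0] + [max(-b, 0.0) for b in beta0]


def residual_ss(X, y, coef):
    """Half the residual sum of squares, 0.5 * ||y - X coef||^2."""
    fitted = [sum(x * c for x, c in zip(row, coef)) for row in X]
    return 0.5 * sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def objective_11(X0, y, beta0, lam):
    """Objective of (1.1): squared-error term plus lam * ||beta0||_1."""
    return residual_ss(X0, y, beta0) + lam * sum(abs(b) for b in beta0)


def objective_12(X0, y, beta, lam):
    """Objective of (1.2) with X = (X0, -X0) and beta >= 0 of length 2p."""
    X = [list(row) + [-x for x in row] for row in X0]
    return residual_ss(X, y, beta) + lam * sum(beta)


# Hypothetical toy data (n = 3, p = 2), chosen only for illustration.
X0 = [[1.0, 0.5], [0.0, 2.0], [1.5, -1.0]]
y = [1.0, -2.0, 0.5]
beta0 = [0.7, -1.2]
lam = 0.3

beta = split_signed(beta0)
gap = objective_11(X0, y, beta0, lam) - objective_12(X0, y, beta, lam)

# Raise beta_1 and beta_{p+1} by the same amount c: the fit is unchanged,
# but the penalty grows by 2 * lam * c, so such a beta cannot be optimal.
c = 0.25
padded = list(beta)
padded[0] += c
padded[2] += c
extra_cost = objective_12(X0, y, padded, lam) - objective_12(X0, y, beta, lam)
```

Here `gap` is zero up to rounding, and `extra_cost` equals `2 * lam * c`, which is the argument for why at most one of each pair is nonzero at the optimum.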

We'll formalize the notion of a signed model by letting $\mathrm{supp}(\beta) = \{1 \le j \le 2p : \beta_j > 0\}$; 
of course, $\mathrm{supp}(\beta)$ is just an alternate description of the support 
and signs of $\beta_0$. For any signed model $S \subseteq \{1, \dots, 2p\}$, let $X_S$ consist of just the 
columns of X given by S. Similarly, let $1_S$ denote a vector of length $|S|$ with all 
ones. 

Of great interest will be the values of y leading to a particular signed model; 
define 

$$A_S = \{y \in \mathbb{R}^n : \mathrm{supp}(\beta(y)) = S\}. \tag{1.3}$$

Clearly, $\bigcup_S A_S = \mathbb{R}^n$, forming a partitioning of the space into disjoint regions. 
We then call a signed model $S \subseteq \{1, \dots, 2p\}$ accessible if $A_S \ne \emptyset$; that is, if 
there is some value of y such that $\mathrm{supp}(\beta(y)) = S$. 
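As a small illustration of this encoding, the sketch below maps a coefficient vector $\beta_0$ to its signed support in $\{1, \dots, 2p\}$ and renders it as a sign string. The function names `signed_support` and `sign_pattern` are hypothetical helpers, not part of any library.

```python
def signed_support(beta0):
    """supp(beta) for beta = (beta0^+, beta0^-), as 1-based indices in {1, ..., 2p}."""
    p = len(beta0)
    support = set()
    for j, b in enumerate(beta0, start=1):
        if b > 0:
            support.add(j)          # positive coefficient on variable j
        elif b < 0:
            support.add(p + j)      # negative coefficient on variable j
    return support


def sign_pattern(support, p):
    """Render a signed model S as a string such as '+0-'."""
    symbols = []
    for j in range(1, p + 1):
        if j in support:
            symbols.append("+")
        elif p + j in support:
            symbols.append("-")
        else:
            symbols.append("0")
    return "".join(symbols)


S = signed_support([2.0, 0.0, -0.5])   # p = 3
label = sign_pattern(S, 3)
```

With p = 3, a positive first coefficient and a negative third coefficient give S = {1, 6}, which reads as "+0-".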


2. Geometry of lasso model selection 

As we will see, the set $A_\emptyset$, or the "null model polytope", plays a special role 
in the geometry of the lasso. Another set which will play an equally important 
role is the convex hull of the columns of X, which we will denote by $\mathcal{CH}(X)$. 
In fact, $A_\emptyset/\lambda$ and $\mathcal{CH}(X)$ are (by definition) polar duals, since 
$A_\emptyset = \{y \in \mathbb{R}^n : X^T y \le \lambda 1\}$ by the KKT conditions for (1.1). 

Each face of $\mathcal{CH}(X)$ has a natural correspondence to a subset of signed 
variables S; namely, the signed variables S that form the vertices of that face. 
We denote by $F_S$ the corresponding face of $A_\emptyset$, which is characterized by the 
property that $X_S^T f = \lambda 1_S$ for any $f \in F_S$. 

The sets $A_S$ are then characterized by the following lemma, which we believe 
was described and proved first by [10] in Lemma 3: 

Lemma 2.1 (Determining $\mathrm{supp}(\beta(y))$). Let $P_{A_\emptyset}(y)$ be the projection of y onto 
the null model polytope $A_\emptyset$. Let $F_S$ be the face of $A_\emptyset$ of minimal dimension such 
that $P_{A_\emptyset}(y) \in F_S$. Then $y \in A_S$. Furthermore, $y - P_{A_\emptyset}(y) = X_S\beta(y)$, the lasso 
fit. 

In the special case where $X_0$ is orthogonal, it is well known that the lasso 
coefficients are equal to the $\lambda$-soft-thresholded inner products between y and 
columns of X. When p = n, this orthogonal design case corresponds to $A_\emptyset$ 
being a hypercube with the origin as its center. (For p < n, it looks more like 
a "cylinder" built over a p-dimensional hypercube.) Lemma 2.1 is the natural 
extension of this soft-thresholding phenomenon to general design matrices; the 
regularization done by the lasso is to "remove" the projection onto $A_\emptyset$. 
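The orthogonal case can be made concrete with the simplest design, $X_0 = I$ with p = n, which is an assumption made here purely for illustration. In that case the null model polytope is the cube $[-\lambda, \lambda]^n$, and the sketch below checks that subtracting the projection onto the cube from y reproduces the soft-thresholded coefficients.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def project_onto_cube(y, lam):
    """Euclidean projection of y onto the cube [-lam, lam]^n."""
    return [min(max(v, -lam), lam) for v in y]


lam = 1.7
y = [3.0, -0.4, -2.5]                     # hypothetical response vector

# Lasso coefficients when X0 is the identity matrix.
beta0 = [soft_threshold(v, lam) for v in y]

# Lemma 2.1 in this special case: fit = y - projection onto the cube.
proj = project_onto_cube(y, lam)
fit = [v - q for v, q in zip(y, proj)]
```

The second coordinate lies inside the cube, so it is projected onto itself and its coefficient is exactly zero; the other two coordinates are clipped to the cube's boundary and keep the excess as their coefficient.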

In fact, Lemma 2.1 gives a complete characterization of the sets $A_S$, as 
illustrated in Figure 1, taken from [4]. 

Corollary 2.1 (Geometry of Partitions $A_S$). If $A_S \ne \emptyset$, then it is equal to 
$F_S + \{X_S a : a > 0\} = \{f + X_S a : f \in F_S, a > 0\}$. 

Proof. Suppose $y \in A_S$. Then by Lemma 2.1, $y - P_{A_\emptyset}(y) = X_S\beta(y)$, so 
$y = P_{A_\emptyset}(y) + X_S\beta(y) \in F_S + \{X_S a : a > 0\}$. Thus, $A_S \subseteq F_S + \{X_S a : a > 0\}$. 

Since $\bigcup_S A_S = \mathbb{R}^n$, and $A_S \subseteq F_S + \{X_S a : a > 0\}$, we need only show that 
the sets $\{F_S + \{X_S a : a > 0\}\}_S$ are disjoint. 

So consider $f_S + X_S a_S$ and $f_T + X_T a_T$, where $f_S \in F_S$ and $a_S > 0$ 
(respectively for T). Then 

$$\|f_S + X_S a_S - (f_T + X_T a_T)\|^2 = \|f_S - f_T\|^2 + 2\langle f_S - f_T, X_S a_S - X_T a_T\rangle + \|X_S a_S - X_T a_T\|^2 > 2\langle f_S - f_T, X_S a_S - X_T a_T\rangle,$$

where the strict inequality is from $X_S a_S \ne X_T a_T$ when $S \ne T$ and $a_S, a_T > 0$. 
Expanding the inner product and using $X_S^T f_S = \lambda 1_S$ and $X_T^T f_T = \lambda 1_T$, 

$$\|f_S + X_S a_S - (f_T + X_T a_T)\|^2 > 2\left(a_S^T(\lambda 1_S - X_S^T f_T) + a_T^T(\lambda 1_T - X_T^T f_S)\right) \ge 0,$$

where $X_S^T f_T \le \lambda 1_S$ and $X_T^T f_S \le \lambda 1_T$ (coordinate-wise) by evaluating the KKT 
conditions at the points $f_T$ and $f_S$ in $A_\emptyset$. 

Thus, $f_S + X_S a_S \ne f_T + X_T a_T$, so that $F_S + \{X_S a : a > 0\}$ and $F_T + \{X_T a : a > 0\}$ 
are disjoint. □ 

We remark that Corollary 2.1 immediately shows that the sets $A_S$ are convex 
and have linear boundaries. It also shows that all nonempty sets $A_S$ are 
unbounded, with the possible exception of $A_\emptyset$. 



Fig 1. Example of sets $A_S \subset \mathbb{R}^2$ with $\lambda = 1.7$. $x_1$, $x_2$, and $x_3$ are the columns of $X_0 \in \mathbb{R}^{2 \times 3}$, 
each of which has length 1. The middle white region is the polytope $A_\emptyset$, the red regions are 
the sets $A_S$ with |S| = 1, and the blue regions those with |S| = 2. Each region is labeled with 
$\mathrm{sgn}(\beta_0(y))$; for example, "+0-" means that the $x_1$ coefficient is positive, the $x_2$ coefficient is 
zero, and the $x_3$ coefficient is negative. 


Now for any face $F_S$ of $A_\emptyset$, there is a point $f_S \in F_S$ such that $F_S$ is the lowest 
dimensional face containing $f_S$. Since $P_{A_\emptyset}(f_S + X_S a) = f_S$ for any a > 0, S is 
in fact an accessible lasso model by Lemma 2.1. Thus, the set of all accessible 
models is determined by the faces of $A_\emptyset$, or alternatively by the faces of $\mathcal{CH}(X)$: 

Theorem 2.1 (Characterization of Accessible Models). $A_S \ne \emptyset$ if and only if 
the variables $X_S$ form a face of $\mathcal{CH}(X)$. 

Proof. From Lemma 2.1, $A_S \ne \emptyset$ if and only if the variables $X_S$ are the 
KKT-active set corresponding to the face $F_S$ of $A_\emptyset$. By the polar duality between $A_\emptyset/\lambda$ 
and $\mathcal{CH}(X)$, this occurs if and only if the variables $X_S$ form a face of $\mathcal{CH}(X)$. □ 

We feel this is an intuitively believable result: The lasso attempts to 
construct a parsimonious approximation of y using a positive linear combination of 
some of the columns of X. If it selected a subset of columns of X that did not 
generate a face of $\mathcal{CH}(X)$, then intuitively it should be easier to generate a more 
parsimonious approximation using variables that do form a face of $\mathcal{CH}(X)$. 
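In the plane the face structure is easy to enumerate by hand, which gives a quick sanity check of the characterization. The sketch below assumes n = 2 and unit-length columns given by their angles (the angles 10, 70, and 130 degrees are hypothetical). Since all 2p points $\pm x_j$ lie on the unit circle, each is a vertex of the hull and the edges join angularly adjacent points. Signed variables are indexed as in (1.2): j for $+x_j$ and p + j for $-x_j$.

```python
import math


def accessible_models_2d(angles_deg):
    """Faces of the convex hull of {+x_j, -x_j} for unit vectors x_j in the plane.

    Returns signed models as frozensets of indices in {1, ..., 2p}: the empty
    model, one model per vertex, and one model per edge of the hull.
    Assumes the angles describe p >= 2 distinct directions (mod 180 degrees).
    """
    p = len(angles_deg)
    points = []
    for j, a in enumerate(angles_deg, start=1):
        t = math.radians(a)
        points.append((t % (2 * math.pi), j))                  # +x_j
        points.append(((t + math.pi) % (2 * math.pi), p + j))  # -x_j
    points.sort()
    labels = [label for _, label in points]
    m = len(labels)
    vertices = [frozenset([lab]) for lab in labels]
    edges = [frozenset([labels[i], labels[(i + 1) % m]]) for i in range(m)]
    return [frozenset()] + vertices + edges


models = accessible_models_2d([10.0, 70.0, 130.0])
n_size_two = sum(1 for s in models if len(s) == 2)
```

For three columns this yields the null model, 6 models of size one, and 6 models of size two, out of the 12 signed pairs that could be formed; for instance {1, 2} ("++0") is an edge while {1, 3} ("+0+") is not.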

3. Bounding the number of accessible lasso models 

Having established that the accessible models correspond to faces of $\mathcal{CH}(X)$, 
we can now apply classical results from polytope theory to count and bound 
the number of accessible models. The main result in this field is the celebrated 
upper bound theorem of [6], which provides a tight upper bound on the number 
of faces of dimension k of a polytope in d dimensions. 

To introduce the upper bound theorem, we will need the notion of a cyclic 
polytope: A cyclic polytope with v vertices in d dimensions is the convex hull of 
v points $\{(t_i, t_i^2, \dots, t_i^d) : i \in \{1, \dots, v\}\}$ (with $t_i \ne t_j$ for any $i \ne j$). 

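Theorem 3.1 (Upper Bound Theorem). A convex polytope with v vertices in 
d dimensions has no more than $f_k(d,v)$ faces of dimension k, where $f_k(d,v)$ 
is the number of faces of dimension k of a cyclic polytope with v vertices in d 
dimensions. 

In particular, $f_k(d,v) = \binom{v}{k+1}$ for $0 \le k < \lfloor d/2 \rfloor$, and in general is 

$$f_k(d,v) = \frac{v - \delta(v-k-2)}{v-k-1}\sum_{j=0}^{\lfloor d/2 \rfloor}\binom{v-1-j}{k+1-j}\binom{v-k-1}{2j-k-1+\delta}, \tag{3.1}$$

where $\delta = d \bmod 2$ is the parity of d. 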

The equation for $f_k(d,v)$ derives from the Dehn-Sommerville equations, and 
may be found, for example, in Theorem 3 of [3]. 
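The face numbers of a cyclic polytope are straightforward to evaluate exactly. The sketch below implements the closed form $f_k(d,v) = \frac{v - \delta(v-k-2)}{v-k-1}\sum_{j=0}^{\lfloor d/2 \rfloor}\binom{v-1-j}{k+1-j}\binom{v-k-1}{2j-k-1+\delta}$ with $\delta = d \bmod 2$, treating binomial coefficients with out-of-range arguments as zero. The function names are hypothetical, and the small cases in the test (a pentagon, and cyclic polytopes in dimensions 3 and 4) serve as a check on the formula as transcribed here.

```python
from fractions import Fraction
from math import comb


def binom(n, k):
    """Binomial coefficient, defined as 0 outside the range 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def cyclic_face_count(k, d, v):
    """Number of k-dimensional faces of a cyclic polytope with v vertices in R^d."""
    delta = d % 2
    total = sum(
        binom(v - 1 - j, k + 1 - j) * binom(v - k - 1, 2 * j - k - 1 + delta)
        for j in range(d // 2 + 1)
    )
    value = Fraction(v - delta * (v - k - 2), v - k - 1) * total
    if value.denominator != 1:
        raise ValueError("face count is not an integer; check the arguments")
    return int(value)


pentagon = [cyclic_face_count(k, 2, 5) for k in range(2)]
dim3 = [cyclic_face_count(k, 3, 5) for k in range(3)]
dim4 = [cyclic_face_count(k, 4, 6) for k in range(4)]
```

The pentagon has 5 vertices and 5 edges, and in dimension 4 with 6 vertices every pair of vertices spans an edge, consistent with $f_k(d,v) = \binom{v}{k+1}$ for $k < \lfloor d/2 \rfloor$.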

The upper bound theorem immediately allows us to bound the number of 
accessible lasso models of size k (the size of a model is the number of non-zero 
coefficients): 

Theorem 3.2. The total number of accessible lasso models of size k is no more 
than $f_{k-1}(n, 2p)$. 

Proof. By Theorem 2.1, the accessible lasso models of size k are exactly those 
whose associated feature vectors form a $(k-1)$-dimensional face of $\mathcal{CH}(X)$. 
$\mathcal{CH}(X)$ is a polytope in n dimensions with at most 2p vertices, so by Theorem 
3.1 it has no more than $f_{k-1}(n, 2p)$ faces of dimension $k-1$. □ 

By summing over McMullen's upper bound for faces of each dimension, we 
can also bound the total number of faces of a convex polytope of all dimensions, 
and hence, the total number of accessible lasso models: 

Lemma 3.1. For v > 2d, the total number of faces of a convex polytope with v 
vertices in dimension d is bounded by 

$$2e(d+1)\left(\frac{2e(v - \lfloor d/2 \rfloor)}{\lceil d/2 \rceil}\right)^{\lceil d/2 \rceil}. \tag{3.2}$$

Harris and Sepehri/The Accessible Lasso Models 




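Proof. Summing (3.1) over k gives a tight upper bound on the total number 
of faces of all dimensions. Since the terms in the sum (3.1) are nonzero only if 
$j \ge k/2$, we may drop all other terms in that sum: 

$$
\begin{aligned}
\sum_{k=0}^{d-1} f_k(d,v)
&= \sum_{k=0}^{d-1}\sum_{j \ge k/2}^{\lfloor d/2 \rfloor} \frac{v - \delta(v-k-2)}{v-k-1}\binom{v-1-j}{k+1-j}\binom{v-k-1}{2j-k-1+\delta} \\
&= \sum_{k=0}^{d-1}\sum_{j \ge k/2}^{\lfloor d/2 \rfloor} \frac{v - \delta(v-k-2)}{v-k-1}\binom{v-j}{v-k-1}\frac{v-k-1}{v-j}\binom{v-k-1}{v-2j-\delta} \\
&\le \sum_{k=0}^{d-1}\sum_{j \ge k/2}^{\lfloor d/2 \rfloor} \frac{v - \delta(v-k-2)}{v - \lfloor d/2 \rfloor}\binom{v-j}{v-k-1}\binom{v-k-1}{v-2j-\delta} \\
&\le \sum_{k=0}^{d-1}\sum_{j \ge k/2}^{\lfloor d/2 \rfloor} \frac{v - \delta(v-d-1)}{v - \lfloor d/2 \rfloor}\binom{v-j}{v-k-1}\binom{v-k-1}{v-2j-\delta} \\
&= \frac{v - \delta(v-d-1)}{v - \lfloor d/2 \rfloor}\sum_{k=0}^{d-1}\sum_{j \ge k/2}^{\lfloor d/2 \rfloor} \binom{v-j}{v-2j-\delta}\binom{j+\delta}{k-j+1},
\end{aligned}
$$

using the identity $\binom{n}{h}\binom{h}{k} = \binom{n}{k}\binom{n-k}{h-k}$. Exchanging the order of summation gives 

$$
\frac{v - \delta(v-d-1)}{v - \lfloor d/2 \rfloor}\sum_{j=0}^{\lfloor d/2 \rfloor}\binom{v-j}{v-2j-\delta}\sum_{k}\binom{j+\delta}{k-j+1}
\le \frac{v - \delta(v-d-1)}{v - \lfloor d/2 \rfloor}\sum_{j=0}^{\lfloor d/2 \rfloor}\binom{v-j}{v-2j-\delta}\,2^{j+\delta}.
$$

For v > 2d, the terms in the sum are nondecreasing in j, so we may bound the 
sum by 

$$
\begin{aligned}
\dots &\le \frac{v - \delta(v-d-1)}{v - \lfloor d/2 \rfloor}\,(\lfloor d/2 \rfloor + 1)\binom{v - \lfloor d/2 \rfloor}{\lceil d/2 \rceil}2^{\lceil d/2 \rceil} \\
&\le \frac{v - \delta(v-d-1)}{v - \lfloor d/2 \rfloor}\,(\lfloor d/2 \rfloor + 1)\,2^{\lceil d/2 \rceil}\left(\frac{(v - \lfloor d/2 \rfloor)e}{\lceil d/2 \rceil}\right)^{\lceil d/2 \rceil} \\
&\le 2e(d+1)\left(\frac{2e(v - \lfloor d/2 \rfloor)}{\lceil d/2 \rceil}\right)^{\lceil d/2 \rceil},
\end{aligned}
$$

using the fact that $\binom{n}{k} \le (ne/k)^k$ in the second inequality and $v \le 2(v - d/2)$ in 
the last inequality (for even d). □ 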

Now, the number of accessible lasso models is exactly equal to the number 
of non-empty faces of $\mathcal{CH}(X)$, a convex polytope in n dimensions with at most 
2p vertices. Thus, the following corollary is immediate: 

Corollary 3.1. For p > n, the total number of accessible lasso models is 
bounded by 

$$2e(n+1)\left(\frac{2e(2p - \lfloor n/2 \rfloor)}{\lceil n/2 \rceil}\right)^{\lceil n/2 \rceil}.$$

In the setting where $p \approx \rho n$, this bound is 

$$O\!\left(n\,\big(4e(2\rho - \tfrac{1}{2})\big)^{n/2}\right).$$


Naively, for n < p, one would bound the total number of lasso models by 
$\sum_{k=0}^{n} 2^k\binom{p}{k}$. In the setting $p \approx \rho n$, this is at least 

$$(2\rho e)^n.$$

Our bound gives $\big(4e(2\rho - \tfrac{1}{2})\big)^{n/2} < (\sqrt{8\rho e})^n \ll (2\rho e)^n$, cutting down the 
number of models exponentially from the naive bound. 
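For a concrete feel of the gap, the counts can be evaluated exactly for a small problem. The sketch below uses hypothetical sizes n = 10 and p = 50 and assumes the standard closed form for the face numbers of a cyclic polytope. It compares the summed upper-bound-theorem counts, the closed-form bound $2e(n+1)\big(2e(2p - \lfloor n/2 \rfloor)/\lceil n/2 \rceil\big)^{\lceil n/2 \rceil}$ as read from Corollary 3.1, and the naive count of nonempty signed models of size at most n.

```python
from fractions import Fraction
from math import ceil, comb, e


def binom(n, k):
    """Binomial coefficient, defined as 0 outside the range 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def cyclic_face_count(k, d, v):
    """Number of k-dimensional faces of a cyclic polytope with v vertices in R^d."""
    delta = d % 2
    total = sum(
        binom(v - 1 - j, k + 1 - j) * binom(v - k - 1, 2 * j - k - 1 + delta)
        for j in range(d // 2 + 1)
    )
    return int(Fraction(v - delta * (v - k - 2), v - k - 1) * total)


n, p = 10, 50   # hypothetical problem size with p much larger than n

# Upper bound theorem: at most f_{k-1}(n, 2p) models of size k, for k = 1..n.
cyclic_total = sum(cyclic_face_count(k, n, 2 * p) for k in range(n))

# Naive count: choose k of the p variables and a sign for each.
naive_total = sum(2 ** k * comb(p, k) for k in range(1, n + 1))

# Closed-form bound (n is even here, so floor and ceiling of n/2 coincide).
half = ceil(n / 2)
closed_form = 2 * e * (n + 1) * (2 * e * (2 * p - n // 2) / half) ** half
```

In this example the exact sum is several orders of magnitude below the naive count, with the closed-form bound in between. The comparison depends on the ratio p/n, so other sizes should be checked rather than extrapolated from this one.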

$\mathcal{CH}(X)$ is no ordinary convex polytope; it also exhibits symmetry through 
the origin. One might expect this special structure to improve the bound on 
the number of faces. However, results of [1] provide evidence suggesting that 
dramatic improvements to the upper bound theorem for centrally symmetric 
polytopes are unlikely. 


4. Model selection inconsistency 

The bound on the number of accessible lasso models implies that correctly 
selecting large true models is impossible: Suppose we have a sequence of lasso 
problems of the form (1.1), with $n \to \infty$ and $p_n/n \to \rho$. Suppose that for each 
n, there is a "true model" $S_n$, of size $k_n = |S_n|$, with $k_n/n \to \kappa$. 

Theorem 4.1. Suppose, for each n, that the entries of $X_0$ are distributed iid 
according to some symmetric distribution on $\mathbb{R}$. Then if $\kappa > 1/2$, for any $\rho > \rho_0$, 
with $\rho_0 = \rho_0(\kappa)$ the threshold identified in the proof, 
we have $P(A_{S_n} = \emptyset) \to 1$. 

Proof. Since the entries of $X_0$ are iid and symmetric, all the signed models 
of size k are equally likely to be a face of $\mathcal{CH}(X)$. Thus, the probability that 
$A_{S_n} \ne \emptyset$ depends only on $k = |S_n|$. This probability is 

$$
P(A_{S_n} \ne \emptyset)
= \frac{E[\#\,\text{accessible models of size } k]}{2^k\binom{p}{k}}
\le \frac{f_k(n, 2p)}{2^k\binom{p}{k}}
\le \frac{\dfrac{2p}{2p-k-1}\displaystyle\sum_{j=0}^{\lfloor n/2 \rfloor}\binom{2p-1-j}{k+1-j}\binom{2p-k-1}{2j-k}}{2^k\binom{p}{k}}. \tag{4.1}
$$



First of all, observe that the terms in the sum are zero unless $j \ge k/2$. Next, we'll 
show that $\binom{2p-1-j}{k+1-j}\binom{2p-k-1}{2j-k}$ is increasing in j, and therefore maximized at 
$j = \lfloor n/2 \rfloor$. To see this, define $a_j = \binom{2p-1-j}{k+1-j}\binom{2p-k-1}{2j-k}$ and consider 

$$
\begin{aligned}
\frac{a_{j+1}}{a_j}
&= \frac{(k-j+1)(2p-2j-1)(2p-2j-2)}{(2p-j-1)(2j-k+2)(2j-k+1)} \\
&\ge \frac{(k - n/2 + 1)(2p-n-1)(2p-n-2)}{(2p - k/2 - 1)(n-k+2)(n-k+1)} \\
&= \frac{(k_n/n - 1/2 + 1/n)(2p_n/n - 1 - 1/n)(2p_n/n - 1 - 2/n)}{(2p_n/n - k_n/(2n) - 1/n)(1 - k_n/n + 2/n)(1 - k_n/n + 1/n)} \\
&\ge \frac{(\kappa - 1/2 - \epsilon)(2\rho - 1 - \epsilon)^2}{(2\rho - \kappa/2 + \epsilon)(1 - \kappa + \epsilon)^2} \\
&= K(\kappa, \rho, \epsilon).
\end{aligned}
$$

If we have $K(\kappa, \rho, 0) > 1$, then we may choose $\epsilon$ small enough such that 
$K(\kappa, \rho, \epsilon) > 1$ as well, and then the terms are increasing in j for sufficiently 
large n. In fact, a sufficient condition for $K(\kappa, \rho, 0) > 1$ is that $\rho$ exceeds the 
threshold assumed in Theorem 4.1, so the terms are in fact increasing. 
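The monotonicity claim can be probed numerically. The sketch below evaluates the limiting ratio $K(\kappa, \rho, \epsilon) = \frac{(\kappa - 1/2 - \epsilon)(2\rho - 1 - \epsilon)^2}{(2\rho - \kappa/2 + \epsilon)(1 - \kappa + \epsilon)^2}$ and, for one hypothetical finite problem (n = 200, p = 1000, k = 120, so $\kappa = 0.6$ and $\rho = 5$), checks directly that the summands $a_j = \binom{2p-1-j}{k+1-j}\binom{2p-k-1}{2j-k}$ increase over $k/2 \le j \le n/2$. The function names are illustrative only.

```python
from math import comb


def limiting_ratio(kappa, rho, eps=0.0):
    """K(kappa, rho, eps): asymptotic lower bound on a_{j+1} / a_j."""
    numerator = (kappa - 0.5 - eps) * (2 * rho - 1 - eps) ** 2
    denominator = (2 * rho - kappa / 2 + eps) * (1 - kappa + eps) ** 2
    return numerator / denominator


def summand(j, k, p):
    """a_j = C(2p - 1 - j, k + 1 - j) * C(2p - k - 1, 2j - k)."""
    return comb(2 * p - 1 - j, k + 1 - j) * comb(2 * p - k - 1, 2 * j - k)


n, p, k = 200, 1000, 120
terms = [summand(j, k, p) for j in range(k // 2, n // 2 + 1)]
increasing = all(a < b for a, b in zip(terms, terms[1:]))

ratio_large_rho = limiting_ratio(0.6, 5.0)   # rho = p / n = 5
ratio_small_rho = limiting_ratio(0.6, 1.2)   # a smaller aspect ratio
```

For $\kappa = 0.6$ the ratio exceeds 1 at $\rho = 5$ but not at $\rho = 1.2$, which illustrates why the argument needs $\rho$ to be sufficiently large relative to $\kappa$.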

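We can thus bound each of the terms in the numerator of (4.1) by the 
maximum, yielding the bound 

$$\frac{n-k}{2}\binom{2p - 1 - \frac{n}{2}}{k + 1 - \frac{n}{2}}\binom{2p-k-1}{n-k} \tag{4.2}$$

for the sum in the numerator of (4.1). Naturally, we substitute the bound (4.2) 
in for (4.1), yielding: 

$$
\begin{aligned}
P(A_{S_n} \ne \emptyset)
&\le \frac{\dfrac{2p}{2p-k-1}\left(\dfrac{n-k}{2}\right)\dbinom{2p - 1 - \frac{n}{2}}{k + 1 - \frac{n}{2}}\dbinom{2p-k-1}{n-k}}{2^k\dbinom{p}{k}} \\
&\le \frac{\dfrac{2\rho + \epsilon}{2\rho - \kappa - \epsilon}\left(\dfrac{n(1 - \kappa + \epsilon)}{2}\right)\dbinom{(2\rho - \frac{1}{2} + \epsilon)n}{(\kappa - \frac{1}{2} + \epsilon)n}\dbinom{(2\rho - \kappa + \epsilon)n}{(1 - \kappa + \epsilon)n}}{2^{(\kappa - \epsilon)n}\dbinom{(\rho - \epsilon)n}{(\kappa - \epsilon)n}}
\end{aligned}
\tag{4.3}
$$

for any choice of $\epsilon > 0$ and sufficiently large n. 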
The expression above involves terms of the form $\binom{an}{bn}$, which we may 
approximate using Stirling's formula as follows: 

$$
\begin{aligned}
\binom{an}{bn}
&= (1 + o(1))\,\frac{\sqrt{2\pi an}\,e^{-an}(an)^{an}}{\sqrt{2\pi bn}\,e^{-bn}(bn)^{bn}\,\sqrt{2\pi(a-b)n}\,e^{-(a-b)n}((a-b)n)^{(a-b)n}} \\
&= (1 + o(1))\,\sqrt{\frac{a}{2\pi b(a-b)n}}\left(\frac{a^a}{b^b(a-b)^{a-b}}\right)^n.
\end{aligned}
\tag{4.4}
$$

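The quality of this approximation is easy to check on the log scale. The sketch below compares the exact value of $\log\binom{an}{bn}$, computed with the log-gamma function, against the logarithm of $\sqrt{a/(2\pi b(a-b)n)}\,\big(a^a/(b^b(a-b)^{a-b})\big)^n$. The values a = 2, b = 0.5, n = 1000 are arbitrary illustrative choices.

```python
import math


def log_binom(n, k):
    """Exact log of the binomial coefficient C(n, k) via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def stirling_log_binom(a, b, n):
    """Log of the Stirling approximation to C(a*n, b*n), for 0 < b < a."""
    prefactor = 0.5 * math.log(a / (2 * math.pi * b * (a - b) * n))
    exponent = a * math.log(a) - b * math.log(b) - (a - b) * math.log(a - b)
    return prefactor + n * exponent


a, b, n = 2.0, 0.5, 1000
exact = log_binom(int(a * n), int(b * n))
approx = stirling_log_binom(a, b, n)
error = abs(exact - approx)
```

The discrepancy shrinks like 1/n, so at n = 1000 the two logarithms already agree to roughly three decimal places even though the binomial coefficient itself has hundreds of digits.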


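Substituting the approximation (4.4) into each of the three binomial terms 
in (4.3) yields: 

$$
\begin{aligned}
P(A_{S_n} \ne \emptyset)
&\le O(\sqrt{n})\left[\frac{(2\rho - \frac{1}{2} + \epsilon)^{2\rho - \frac{1}{2} + \epsilon}}{(\kappa - \frac{1}{2} + \epsilon)^{\kappa - \frac{1}{2} + \epsilon}(2\rho - \kappa)^{2\rho - \kappa}}
\cdot\frac{(2\rho - \kappa + \epsilon)^{2\rho - \kappa + \epsilon}}{(1 - \kappa + \epsilon)^{1 - \kappa + \epsilon}(2\rho - 1)^{2\rho - 1}}
\cdot\frac{(\kappa - \epsilon)^{\kappa - \epsilon}(\rho - \kappa)^{\rho - \kappa}}{2^{\kappa - \epsilon}(\rho - \epsilon)^{\rho - \epsilon}}\right]^n \\
&= O(\sqrt{n})\,C(\rho, \kappa, \epsilon)^n.
\end{aligned}
$$

Thus, $P(A_{S_n} \ne \emptyset) \to 0$ if $C(\rho, \kappa, \epsilon) < 1$. We may choose $\epsilon$ arbitrarily small, 
and so in fact the same holds if 

$$
C(\rho, \kappa, 0) = \frac{(2\rho - \frac{1}{2})^{2\rho - \frac{1}{2}}\,\kappa^{\kappa}\,(\rho - \kappa)^{\rho - \kappa}}{2^{\kappa}\,(\kappa - \frac{1}{2})^{\kappa - \frac{1}{2}}\,(1 - \kappa)^{1 - \kappa}\,(2\rho - 1)^{2\rho - 1}\,\rho^{\rho}} < 1.
$$

For any $\kappa > 1/2$ there exists a $\rho_0$ such that for $\rho > \rho_0$ we have $C(\rho, \kappa, 0) < 1$. 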
Concretely, po = —yr satisfies this. □ 
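The exponential rate can be evaluated numerically to see where it drops below 1. The sketch below assumes the rate is the one obtained by applying Stirling's formula to the three binomial coefficients and the factor $2^k$ with $\epsilon = 0$, namely $C(\rho, \kappa, 0) = \frac{(2\rho - 1/2)^{2\rho - 1/2}\kappa^{\kappa}(\rho - \kappa)^{\rho - \kappa}}{2^{\kappa}(\kappa - 1/2)^{\kappa - 1/2}(1 - \kappa)^{1 - \kappa}(2\rho - 1)^{2\rho - 1}\rho^{\rho}}$, and works with its logarithm for numerical stability. The function name and the parameter values are illustrative.

```python
import math


def log_rate(rho, kappa):
    """log C(rho, kappa, 0); requires 1/2 < kappa < 1 and rho > kappa."""

    def xlogx(x):
        return x * math.log(x)

    return (
        xlogx(2 * rho - 0.5)
        + xlogx(kappa)
        + xlogx(rho - kappa)
        - kappa * math.log(2)
        - xlogx(kappa - 0.5)
        - xlogx(1 - kappa)
        - xlogx(2 * rho - 1)
        - xlogx(rho)
    )


decaying = log_rate(10.0, 0.8)       # large aspect ratio rho = p / n
not_decaying = log_rate(1.2, 0.6)    # small aspect ratio
```

A negative value means the bound $O(\sqrt{n})\,C^n$ tends to zero. Under this assumed form, $\rho = 10$ with $\kappa = 0.8$ gives decay, while $\rho = 1.2$ with $\kappa = 0.6$ does not, so the bound is informative only once $\rho$ is large enough for the given $\kappa$.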


Of course, one might consider it overly stringent to require that the specific 
true model S be accessible; one might be happy even if some “similar” model to 
the true model were accessible. However, Theorem 4.1 combined with a simple 
union bound shows that this is impossible even for reasonably close models. 


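Corollary 4.1. Consider the same sequence of lasso problems as in Theorem 
4.1. Let $B_\epsilon(S) = \{T \subseteq \{1, \dots, 2p\} : |S \,\triangle\, T| \le \epsilon n\}$. Then for $\rho$ and $\kappa$ such that 
$\rho$ exceeds the threshold of Theorem 4.1, there exists $\epsilon = \epsilon(\rho, \kappa) > 0$ such that 

$$P\Big(\bigcup_{T \in B_\epsilon(S_n)} A_T \ne \emptyset\Big) \to 0.$$
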


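Proof. We have that 

$$
|B_{\epsilon/\log n}(S_n)| \le (2p)^{\epsilon n/\log n}
= (2\rho n)^{\epsilon n/\log n}
= \left(e^{\log(2\rho n)}\right)^{\epsilon n/\log n}
= e^{\epsilon\log(2\rho)\,n/\log n}\,e^{\epsilon n}.
$$

Note that $C(\rho, \kappa, 0)$ is a continuous function, hence for $\rho$ and $\kappa$ such that 
$C(\rho, \kappa, 0) < 1$, we can choose n large enough such that perturbing $\kappa$ by $\epsilon/\log n$ to 
$\kappa'$ keeps $C(\rho, \kappa', 0)$ less than $1 - \delta$, for small enough $\delta$. Then by a simple union 
bound we get 

$$
P\Big(\bigcup_{T} A_T \ne \emptyset\Big) \le |B_{\epsilon/\log n}(S_n)|\cdot O(\sqrt{n})\,(1 - \delta)^n
\le O(\sqrt{n})\,e^{\epsilon\log(2\rho)\,n/\log n}\,\big(e^{\epsilon}(1 - \delta)\big)^n.
$$

Thus we can choose $\epsilon$ small enough such that $e^{\epsilon}(1 - \delta) < 1$, which makes the 
right hand side go to zero as n goes to infinity. □ 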

Remark. It is important to note the difference between a model being 
accessible and the same model being selected by the lasso. Accessibility is a property of 
the model and the design matrix. For a model to be selected, on the other hand, 
not only must it be accessible, but the response vector must also 
lie in a certain subset of the space. Being selected is therefore a more 
stringent requirement than being accessible. Our argument for model selection inconsistency is 
based on the non-accessibility of most large models, and for that reason it is a 
sub-optimal result. In fact, Corollary 7.1 in [2] implies that we cannot improve 
drastically upon Theorem 4.1 by relaxing the assumption $\kappa > 1/2$. In the notation 
and context of Theorem 4.1, it asserts that the lower bound on $\kappa$ cannot be 
taken smaller than $(2e\log\rho)^{-1}$ when the entries of the design matrix are 
independent Gaussian variables. 

5. Simulations 

This section shows results from simulation studies of model selection 
consistency for the lasso. We follow the setup from the previous section, assuming that 
$p_n/n \to \rho > 1$ and $k_n/n \to \kappa$. We consider random design matrices with iid 
entries as well as independent observations from correlated features. 

We consider the normalized Hamming distance between the fitted signed 
model and the true signed model as a measure of mis-selection. In particular, the 
relative selection error is this distance expressed through $\|\cdot\|_0$, the number of non-zero 
elements, where each $\beta$ is a signed model compatible with the notation of (1.2). Figure 2 
illustrates the selection error along a lasso path. 
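A minimal sketch of such an error measure is given below. The normalization is an assumption made here for illustration: the number of coordinates whose sign (positive, negative, or zero) differs between the fitted and true coefficient vectors is divided by the size of the true model. The function names are hypothetical, and the normalization may differ from the one used to produce the figures.

```python
def signs(beta0):
    """Sign vector of beta0 with entries in {-1, 0, +1}."""
    return [(b > 0) - (b < 0) for b in beta0]


def relative_selection_error(beta_hat, beta_true):
    """Hamming distance between signed models, divided by the true model size.

    Assumption: the normalizer is the number of non-zero true coefficients.
    """
    mismatches = sum(s != t for s, t in zip(signs(beta_hat), signs(beta_true)))
    true_size = sum(t != 0 for t in signs(beta_true))
    return mismatches / true_size


# Hypothetical fitted and true coefficient vectors with p = 4.
err = relative_selection_error([10.0, 0.0, -3.0, 0.0], [10.0, 10.0, 0.0, 0.0])
perfect = relative_selection_error([5.0, 0.0, 0.0], [10.0, 0.0, 0.0])
```

Only the sign pattern matters here, so a fitted coefficient of 5 for a true coefficient of 10 counts as correctly selected.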

First, we examine the results in a setup where the conditions for our theorems 
hold. Assume we have n = 100 observations of p = 120 variables, and a response 
variable. The coefficient vector $\beta$ has k randomly selected entries equal to 
10, and the rest of the entries equal to zero; k takes values 5, 10, 20, 30, 45, 60. For 
each instance of the model we search over the entire lasso path to find the 
closest possible lasso model to the true model. We then report the normalized 
Hamming distance between the best lasso model and the true model as the 
relative selection error, computed between $\hat\beta$, the closest lasso model, and $\beta$, 
the true model. For each n, p, and k we repeat this procedure 1000 times and 
show the histogram of the relative selection error of the best models selected 
by lasso. Figure 3 shows these histograms. As can be seen, for small k, i.e. 
a sparser model, lasso can select a model quite close to the true model. As 
k grows, the selection becomes worse. In particular, for $k \ge n/2$ lasso does not 
select a model even close to the true model. 

Fig 2. Example of relative selection error along a lasso path. The lasso path is generated using 
the R package 'glmnet', on n = 1000 observations of p = 1200 variables on a net of 10p values 
of $\lambda$. The design matrix has iid Gaussian entries. The coefficient vector $\beta$ has 150 randomly 
selected entries equal to 10, and the rest of the entries equal to zero. Noise is independent of 
the design matrix and has standard normal distribution. The blue line shows the minimum 
selection error achieved by a lasso selected model. 

We also try a similar simulation when the assumption of iid entries for the 
design matrix is relaxed to a more realistic condition. We assume 
independent observations of correlated features. Assume n = 100, p = 120, and 
k = 45. The design matrix consists of n observations from a p-variate normal 
distribution with mean zero, variance one, and all the pairwise correlations equal 
to a correlation parameter $\rho$. In this study $\rho$ takes on values 0, 0.1, 0.3, 0.5, 0.7, 0.9. The rest of the setup 
is similar to that of the previous paragraph. This is illustrated in Figure 4. It 
suggests that the selection error becomes slightly worse when the features are 
correlated. 

To summarize, the simulation studies are consistent with the theoretical results 
from the previous section. They also suggest that the results of this paper, although 
stated in an asymptotic regime, are in agreement with simulations for relatively 
small values of n. We conclude this section with the following remark. 

Remark. It is worth mentioning that ideally one would try to numerically 
investigate accessibility of lasso models. This is computationally demanding, if 
not impossible, because it requires finding the convex hull of feature vectors in 
n dimensions. State of the art methods for computing the convex hull of 2p 
points in n dimensions require $O((2p)^{\lfloor n/2 \rfloor})$ operations, which is prohibitive for 
relatively small n and impossible for real problems of interest. For this reason, 
we focus on lasso selected models for simulation purposes. 
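To put the operation count in perspective, the following sketch evaluates the order of magnitude of $(2p)^{\lfloor n/2 \rfloor}$ for the simulation sizes used in this section and for a smaller hypothetical problem. Constants hidden by the O-notation are ignored, so the numbers are only indicative.

```python
import math


def hull_cost_log10(n, p):
    """Base-10 logarithm of (2p)^floor(n/2), the stated operation count."""
    return (n // 2) * math.log10(2 * p)


simulation_size = hull_cost_log10(100, 120)   # n = 100, p = 120
small_problem = hull_cost_log10(20, 30)       # hypothetical smaller instance
```

For n = 100 and p = 120 the count is on the order of $10^{119}$ operations, and even n = 20 with p = 30 gives roughly $10^{18}$.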


Fig 3. Histogram of relative selection error for n = 100, p = 120, and different sparsity levels 
k (panels shown: k = 5, 20, 30, 60). Each histogram is generated using 1000 repetitions. 

Fig 4. Histogram of relative selection error for n = 100, p = 120, k = 45, and different 
correlation parameters $\rho$. Each histogram is generated using 1000 repetitions. 
6. Discussion 

We have characterized the geometry of the partitioning of the space $\mathbb{R}^n$ of the 
response vector into the regions $\{A_S\}_S$ corresponding to different accessible 
signed models S. We have also shown that these accessible models correspond 
to faces of the convex hull of the design matrix and its negative counterpart, 
$\mathcal{CH}(X)$. We then used the upper bound theorem from polytope combinatorics 
to bound the number of accessible lasso models of each size, and then used this 
bound to directly prove a model selection inconsistency result. 

It is worth mentioning that all of our results hold regardless of the choice 
of the regularization parameter $\lambda$. The geometric picture of $\mathcal{CH}(X)$ does not 
depend on $\lambda$, the set of accessible models therefore does not depend on $\lambda$ either, 
and consequently the model selection inconsistency result holds for any possible 
choice of $\lambda$, algorithm to choose $\lambda$, or even oracle who says what $\lambda$ ought to be. 

Our results show that model selection is impossible when $k > n/2$: not only 
will the correct model not be chosen, the design actually makes it impossible 
to do so (with high probability). This nicely complements seminal results on 
model selection for the lasso, by [13], which have the true model size k growing 
as $O(n^c)$, with c < 1. In this setup, [13] showed that an almost necessary and 
sufficient condition for model selection consistency is that the "irrepresentability 
conditions" hold on the design matrix. 

Another set of results, by [12], shows that model selection is impossible in 
the setup where the true model grows faster than $O(n/\log n)$. We consider our 
model selection inconsistency theorem a geometric description of these results. 
These results were later generalized in [11] to get information-theoretic limits 
on sparsity recovery using a general decoder. 